Edit Machines for Robust Multimodal Language Processing
Authors
Abstract
Multimodal grammars provide an expressive formalism for multimodal integration and understanding. However, hand-crafted multimodal grammars can be brittle with respect to unexpected, erroneous, or disfluent inputs. Spoken language (speech-only) understanding systems have addressed this lack of robustness in hand-crafted grammars by using classification techniques to extract the fillers of a frame representation. In this paper, we illustrate the limitations of such classification approaches for multimodal integration and understanding, and present an approach based on edit machines that combine the expressiveness of multimodal grammars with the robustness of the stochastic language models used in speech recognition. We also present an approach in which the edit operations are trained from data using a noisy channel model paradigm. We evaluate and compare the performance of the hand-crafted and learned edit machines in the context of a multimodal conversational system (MATCH).
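The core idea of an edit machine, mapping an out-of-grammar input to the closest string the grammar accepts, can be illustrated with a minimal sketch. This is not the paper's finite-state implementation: it assumes the grammar's language is a small enumerated list of sentences, uses uniform edit costs, and the names `edit_cost`, `closest_in_grammar`, and `GRAMMAR` are hypothetical.

```python
# Illustrative sketch only: the grammar is a toy enumerated list, not a
# finite-state multimodal grammar, and all costs are assumed to be uniform.

def edit_cost(source, target, ins=1.0, dele=1.0, sub=1.0):
    """Weighted word-level edit distance between two token lists."""
    # prev[j] holds the cost of turning the first i-1 source tokens
    # into the first j target tokens.
    prev = [j * ins for j in range(len(target) + 1)]
    for i, s in enumerate(source, start=1):
        curr = [i * dele]
        for j, t in enumerate(target, start=1):
            curr.append(min(
                prev[j] + dele,                          # delete s
                curr[j - 1] + ins,                       # insert t
                prev[j - 1] + (0.0 if s == t else sub),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def closest_in_grammar(tokens, grammar_strings):
    """Return the in-grammar sentence with the lowest edit cost to tokens."""
    return min(grammar_strings, key=lambda g: edit_cost(tokens, g))


# Hypothetical toy language standing in for what a grammar would accept.
GRAMMAR = [
    "show cheap italian restaurants in chelsea".split(),
    "show restaurants in this area".split(),
    "phone for this restaurant".split(),
]

# A disfluent input that the toy grammar does not accept verbatim.
spoken = "um show me cheap italian restaurants uh in chelsea".split()
best = closest_in_grammar(spoken, GRAMMAR)
print(" ".join(best))
```

In the learned variant described in the abstract, the uniform costs would be replaced by weights estimated from data; the sketch leaves them as plain parameters.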
Similar resources
Articles: Robust Understanding in Multimodal Interfaces
Multimodal grammars provide an effective mechanism for quickly creating integration and understanding capabilities for interactive systems supporting simultaneous use of multiple input modalities. However, like other approaches based on hand-crafted grammars, multimodal grammars can be brittle with respect to unexpected, erroneous, or disfluent input. In this article, we show how the finite-sta...
Probabilistic Finite State Machines for Regression-based MT Evaluation
Accurate and robust metrics for automatic evaluation are key to the development of statistical machine translation (MT) systems. We first introduce a new regression model that uses a probabilistic finite state machine (pFSM) to compute weighted edit distance as predictions of translation quality. We also propose a novel pushdown automaton extension of the pFSM model for modeling word swapping a...
A Statistical Approach to Multimodal Natural Language Interaction
The Human-Centric Word Processor is a research prototype that allows users to create, edit and manage documents. Users can use real-time continuous speech recognition to dictate the contents of a document. Speech recognition is coupled with pen or mouse based input to facilitate all aspects of the command and control of the application. The system is multimodal, allowing the user to point and s...
Multimodal signal processing in naturalistic noisy environments
When a system must process spoken language in natural environments that involve different types and levels of noise, the problem of supporting robust recognition is a very difficult one. In the present studies, over 2,600 multimodal utterances were collected during both mobile and stationary use of a multimodal pen/voice system. The results confirmed that multimodal signal processing supports s...
OpenMM: An Open-Source Multimodal Feature Extraction Tool
The primary use of speech is in face-to-face interactions and situational context and human behavior therefore intrinsically shape and affect communication. In order to usefully model situational awareness, machines must have access to the same streams of information humans have access to. In other words, we need to provide machines with features that represent each communicative modality: face...